Goto

Collaborating Authors

 target dynamic


MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning

Guo, Yihong, Yang, Yu, Xu, Pan, Liu, Anqi

arXiv.org Artificial Intelligence

We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets with mismatched dynamics. Existing methods either penalize the reward or discard source transitions occurring in parts of the transition space with high dynamics shift. As a result, they optimize the policy using data from low-shift regions, limiting exploration of high-reward states in the target domain that do not fall within these regions. Consequently, such methods often fail when the dynamics shift is significant or the optimal trajectories lie outside the low-shift regions. To overcome this limitation, we propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target dynamics transitions to explore the target domain, rather than only being trained with the low dynamics-shift transitions. For the dynamics learning, built on the observation that achieving the same next state requires taking different actions in different domains, MOBODY employs separate action encoders for each domain to encode different actions to the shared latent space while sharing a unified representation of states and a common transition function. We further introduce a target Q-weighted behavior cloning loss in policy optimization to avoid out-of-distribution actions, which push the policy toward actions with high target-domain Q-values, rather than high source domain Q-values or uniformly imitating all actions in the offline dataset. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines as well as policy learning methods based on different dynamics learning baselines, with especially pronounced improvements in challenging scenarios where existing methods struggle.


Controlling dynamical systems into unseen target states using machine learning

Köglmayr, Daniel, Haluszczynski, Alexander, Räth, Christoph

arXiv.org Artificial Intelligence

We present a novel, model-free, and data-driven methodology for controlling complex dynamical systems into previously unseen target states, including those with significantly different and complex dynamics. Leveraging a parameter-aware realization of next-generation reservoir computing, our approach accurately predicts system behavior in unobserved parameter regimes, enabling control over transitions to arbitrary target states. Crucially, this includes states with dynamics that differ fundamentally from known regimes, such as shifts from periodic to intermittent or chaotic behavior. The method's parameter-awareness facilitates non-stationary control, ensuring smooth transitions between states. By extending the applicability of machine learning-based control mechanisms to previously inaccessible target dynamics, this methodology opens the door to transformative new applications while maintaining exceptional efficiency. Our results highlight reservoir computing as a powerful alternative to traditional methods for dynamic system control.


Off-dynamics Conditional Diffusion Planners

Ng, Wen Zheng Terence, Chen, Jianda, Zhang, Tianwei

arXiv.org Artificial Intelligence

Offline Reinforcement Learning (RL) offers an attractive alternative to interactive data acquisition by leveraging pre-existing datasets. However, its effectiveness hinges on the quantity and quality of the data samples. This work explores the use of more readily available, albeit off-dynamics datasets, to address the challenge of data scarcity in Offline RL. We propose a novel approach using conditional Diffusion Probabilistic Models (DPMs) to learn the joint distribution of the large-scale off-dynamics dataset and the limited target dataset. To enable the model to capture the underlying dynamics structure, we introduce two contexts for the conditional model: (1) a continuous dynamics score allows for partial overlap between trajectories from both datasets, providing the model with richer information; (2) an inverse-dynamics context guides the model to generate trajectories that adhere to the target environment's dynamic constraints. Empirical results demonstrate that our method significantly outperforms several strong baselines. Ablation studies further reveal the critical role of each dynamics context. Additionally, our model demonstrates that by modifying the context, we can interpolate between source and target dynamics, making it more robust to subtle shifts in the environment.


Data-Driven Approaches for Modelling Target Behaviour

Schlangen, Isabel, Brandenburger, André, Sun, Mengwei, Hopgood, James R.

arXiv.org Machine Learning

The performance of tracking algorithms strongly depends on the chosen model assumptions regarding the target dynamics. If there is a strong mismatch between the chosen model and the true object motion, the track quality may be poor or the track is easily lost. Still, the true dynamics might not be known a priori or it is too complex to be expressed in a tractable mathematical formulation. This paper provides a comparative study between three different methods that use machine learning to describe the underlying object motion based on training data. The first method builds on Gaussian Processes (GPs) for predicting the object motion, the second learns the parameters of an Interacting Multiple Model (IMM) filter and the third uses a Long Short-Term Memory (LSTM) network as a motion model. All methods are compared against an Extended Kalman Filter (EKF) with an analytic motion model as a benchmark and their respective strengths are highlighted in one simulated and two real-world scenarios.


Bridging Autoencoders and Dynamic Mode Decomposition for Reduced-order Modeling and Control of PDEs

Saha, Priyabrata, Mukhopadhyay, Saibal

arXiv.org Artificial Intelligence

Modeling and controlling complex spatiotemporal dynamical systems driven by partial differential equations (PDEs) often necessitate dimensionality reduction techniques to construct lower-order models for computational efficiency. This paper explores a deep autoencoding learning method for reduced-order modeling and control of dynamical systems governed by spatiotemporal PDEs. We first analytically show that an optimization objective for learning a linear autoencoding reduced-order model can be formulated to yield a solution closely resembling the result obtained through the dynamic mode decomposition with control algorithm. We then extend this linear autoencoding architecture to a deep autoencoding framework, enabling the development of a nonlinear reduced-order model. Furthermore, we leverage the learned reduced-order model to design controllers using stability-constrained deep neural networks. Numerical experiments are presented to validate the efficacy of our approach in both modeling and control using the example of a reaction-diffusion system.


OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Luo, Yu, Ji, Tianying, Sun, Fuchun, Zhang, Jianwei, Xu, Huazhe, Zhan, Xianyuan

arXiv.org Artificial Intelligence

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performances and high learning variances. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a distribution discriminator and a small-size local buffer. We conduct extensive experiments based on the OpenAI Gym, Meta-World, and Panda Robots environments, encompassing policy shifts under stationary and nonstationary dynamics, as well as domain adaption. The results demonstrate that OMPO outperforms the specialized baselines from different categories in all settings. We also find that OMPO exhibits particularly strong performance when combined with domain randomization, highlighting its potential in RL-based robotics applications


Reservoir Computing with Generalized Readout based on Generalized Synchronization

Ookubo, Akane, Inubushi, Masanobu

arXiv.org Artificial Intelligence

Reservoir computing is a machine learning framework that exploits nonlinear dynamics, exhibiting significant computational capabilities. One of the defining characteristics of reservoir computing is its low cost and straightforward training algorithm, i.e. only the readout, given by a linear combination of reservoir variables, is trained. Inspired by recent mathematical studies based on dynamical system theory, in particular generalized synchronization, we propose a novel reservoir computing framework with generalized readout, including a nonlinear combination of reservoir variables. The first crucial advantage of using the generalized readout is its mathematical basis for improving information processing capabilities. Secondly, it is still within a linear learning framework, which preserves the original strength of reservoir computing. In summary, the generalized readout is naturally derived from mathematical theory and allows the extraction of useful basis functions from reservoir dynamics without sacrificing simplicity. In a numerical study, we find that introducing the generalized readout leads to a significant improvement in accuracy and an unexpected enhancement in robustness for the short- and long-term prediction of Lorenz chaos, with a particular focus on how to harness low-dimensional reservoir dynamics. A novel way and its advantages for physical implementations of reservoir computing with generalized readout are briefly discussed.


Generation of Time-Varying Impedance Attacks Against Haptic Shared Control Steering Systems

Mohammadi, Alireza, Malik, Hafiz

arXiv.org Artificial Intelligence

The safety-critical nature of vehicle steering is one of the main motivations for exploring the space of possible cyber-physical attacks against the steering systems of modern vehicles. This paper investigates the adversarial capabilities for destabilizing the interaction dynamics between human drivers and vehicle haptic shared control (HSC) steering systems. In contrast to the conventional robotics literature, where the main objective is to render the human-automation interaction dynamics stable by ensuring passivity, this paper takes the exact opposite route. In particular, to investigate the damaging capabilities of a successful cyber-physical attack, this paper demonstrates that an attacker who targets the HSC steering system can destabilize the interaction dynamics between the human driver and the vehicle HSC steering system through synthesis of time-varying impedance profiles. Specifically, it is shown that the adversary can utilize a properly designed non-passive and time-varying adversarial impedance target dynamics, which are fed with a linear combination of the human driver and the steering column torques. Using these target dynamics, it is possible for the adversary to generate in real-time a reference angular command for the driver input device and the directional control steering assembly of the vehicle. Furthermore, it is shown that the adversary can make the steering wheel and the vehicle steering column angular positions to follow the reference command generated by the time-varying impedance target dynamics using proper adaptive control strategies. Numerical simulations demonstrate the effectiveness of such time-varying impedance attacks, which result in a non-passive and inherently unstable interaction between the driver and the HSC steering system.


Motion-Encoded Particle Swarm Optimization for Moving Target Search Using UAVs

Phung, Manh Duong, Ha, Quang Phuc

arXiv.org Artificial Intelligence

This paper presents a novel algorithm named the motion-encoded particle swarm optimization (MPSO) for finding a moving target with unmanned aerial vehicles (UAVs). From the Bayesian theory, the search problem can be converted to the optimization of a cost function that represents the probability of detecting the target. Here, the proposed MPSO is developed to solve that problem by encoding the search trajectory as a series of UAV motion paths evolving over the generation of particles in a PSO algorithm. This motion-encoded approach allows for preserving important properties of the swarm including the cognitive and social coherence, and thus resulting in better solutions. Results from extensive simulations with existing methods show that the proposed MPSO improves the detection performance by 24\% and time performance by 4.71 times compared to the original PSO, and moreover, also outperforms other state-of-the-art metaheuristic optimization algorithms including the artificial bee colony (ABC), ant colony optimization (ACO), genetic algorithm (GA), differential evolution (DE), and tree-seed algorithm (TSA) in most search scenarios. Experiments have been conducted with real UAVs in searching for a dynamic target in different scenarios to demonstrate MPSO merits in a practical application.


Learning recurrent dynamics in spiking networks

Kim, Christopher, Chow, Carson

arXiv.org Artificial Intelligence

Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity of a balanced network, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synaptically-coupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.